One of the critical assumptions of the one-sample and two-sample t-tests is that the distribution of the sample means is normal. We have learned that if the distribution of our sample is normal, we are guaranteed that the distribution of our sample means will be normal. Additionally, we have learned that the Central Limit Theorem (CLT) tells us that if our sample size is big enough, the distribution of our sample means will be normal even if the distribution of our sample (or of the population from which we drew this sample) is non-normal.
However, it is often the case in ecology that our samples are very non-normal and our sample sizes are too small for the CLT to rescue us. This may be because the data are strongly skewed or because there are outliers in the data. In this lab you will learn how to deal with non-normal data and data with extreme outliers.
No matter what, we first have to check whether our data are normal. You learned how to do this in Lab 4, but as a quick recap, the tools for checking normality are:
histograms/boxplots: If the histogram is not “bell shaped”, or the boxplot shows an off-center median or asymmetric whiskers, this is your first indication that your data may be non-normal
qqPlot: If the data do not all stay within the confidence band, your data may be non-normal
Shapiro-Wilk Test: A quantitative test of the null hypothesis that your data come from a normal distribution
Remember to use all of these tools to determine normality because sometimes not all of them will agree and you will have to use your best judgement on your data distribution.
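As a quick sketch of this toolkit in action (using simulated data, since no data set is assumed yet), all three checks can be run in a few lines of base R; `car::qqPlot()` works like `qqnorm()` below but adds the confidence band:

```r
# Simulated right-skewed sample to practice the three normality checks
set.seed(42)
x <- rexp(50, rate = 1)   # exponential data: clearly non-normal

hist(x)                   # check 1: histogram is not bell shaped
boxplot(x)                # check 1: median sits low in the box, long upper tail

qqnorm(x); qqline(x)      # check 2: points bend away from the reference line

shapiro.test(x)           # check 3: tiny p-value, so we reject normality
```

Run the three checks together, as the text suggests, rather than relying on any single one.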
Outliers can drastically influence the results of t-tests and other parametric tests. It is always important to identify outliers before you begin your analysis. While there are formal methods for determining what constitutes an outlier, a simple method is to make a boxplot of your data. A boxplot will flag any points that do not lie within a given range above or below the quartiles of your data. This range can vary depending on how you build your boxplots, but typically it is defined relative to the first and third quartiles. In R, the default is to flag points more than 1.5 times the distance between the first and third quartiles beyond the box.
While sometimes outliers can be erroneous values (i.e. you measured something incorrectly) they often tell you something important about the population you are measuring. Therefore, you cannot drop an outlier from your analysis without having good reason to do so. Good reasons could include measurement error or an underlying difference between that individual and the rest of the population (for example, the 100 m sprint time for an athlete who was taking steroids). Dropping an outlier because it makes your life and the analysis easier is not a good reason…though we all wish it was. However, it is good practice to run your analysis with and without your outlier to determine if/how the outlier changes your results. We will have examples of this in future Lab Exercises, but for now understand why outliers can influence your data distribution.
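To make the 1.5 rule concrete, here is a toy sketch (made-up numbers) of how the default boxplot fences are computed, alongside R's built-in `boxplot.stats()`:

```r
# Default boxplot rule: flag points beyond 1.5 * IQR from the quartiles
x <- c(2, 3, 3, 4, 4, 5, 5, 6, 20)    # toy data with one obvious outlier

q1  <- quantile(x, 0.25)              # first quartile
q3  <- quantile(x, 0.75)              # third quartile
iqr <- q3 - q1                        # interquartile range, same as IQR(x)

lower_fence <- q1 - 1.5 * iqr         # here: 3 - 3 = 0
upper_fence <- q3 + 1.5 * iqr         # here: 5 + 3 = 8
x[x < lower_fence | x > upper_fence]  # the point boxplot() would flag: 20

boxplot.stats(x)$out                  # R's own list of flagged points
```

Flagged points are candidates for investigation, not automatic deletion, as discussed above.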
When you transform your data you apply some function (often non-linear) to your data. Wait… why the heck can you do this? If you think about it, the scale that we use to measure things is completely arbitrary (think about Celsius vs Fahrenheit). We could just as easily measure things in a base 2 system and there would, in principle, be nothing wrong with that. If this helps, you can imagine that by transforming your data you are moving to an alien world where they measure everything on a different scale (e.g. on a log scale). Here are some of the reasons why we transform our data:
Some common transformations and the scenarios in which you may use them. The variable Y represents your original data and Y’ represents your transformed data.
(more transformations can be found in the vocab section)
Transformations are not magic. After you have transformed your data you must retest all of your assumptions. I am going to say this again: you must retest all of your assumptions. Moreover, transforming your data inherently changes your null hypothesis. For example, if I log-transformed my number of flowers variable and then tested whether the means were different between female and male bison, my \(H_0\) becomes that there is no difference in the log-mean.
Any conclusions you make about your data have to be in terms of the transformation. For example, you would conclude that the log-mean weight of female bison was significantly larger than the log-mean weight of male bison.
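A minimal sketch of the retest-everything rule, using simulated log-normal data (so the log transform happens to be exactly the right fix here; on real data it may not be):

```r
# Skewed data often fail the normality check until they are transformed,
# but you must actually re-run the check to know
set.seed(1)
y <- rlnorm(40, meanlog = 0, sdlog = 1)  # log-normal: right skewed on raw scale

shapiro.test(y)        # raw data: expect a very small p-value (non-normal)
shapiro.test(log(y))   # log(y) is exactly normal here, so this check should pass
# any test run on log(y) is now about the log-mean, not the raw mean
```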
If a transformation does not fix your problem (e.g. make your data more normal, reduce the heterogeneity of variance, etc.) you cannot proceed with your parametric analysis. Fortunately, there are some non-parametric options that you can consider. Remember that non-parametric tests do not assume the data follow any particular distribution.
Non-parametric tests make no assumptions about the distributions of the data and thus are robust to violations of normality. When our data are not normal and we can’t fix them with a transformation we typically turn to non-parametric tests. You typically do non-parametric tests on your untransformed data. Transforming your data will not change the results of a non-parametric test, but it will change your interpretation of the results. Who wants to talk about medians of square-root transformed data when you don’t have to? Keep in mind that non-parametric tests can be used with data that are normally distributed. However, parametric tests that assume normality have more power, so you should use those if your data are normal.
Some additional tests you can use
# code to run a Welch's test
#t.test(group1, group2, var.equal = FALSE)
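Since `group1` and `group2` above are just placeholders, here is a runnable sketch with simulated groups of unequal spread:

```r
# Welch's t-test does not assume equal variances between the two groups
set.seed(7)
group1 <- rnorm(15, mean = 10, sd = 1)   # tight group
group2 <- rnorm(20, mean = 12, sd = 4)   # much more variable group

t.test(group1, group2, var.equal = FALSE)  # note: df is no longer n1 + n2 - 2
```

With `var.equal = FALSE` (the default in R), the degrees of freedom are adjusted for the unequal variances.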
You have already learned about these tests in class. They rank your data (e.g. smallest value has a rank of 1 and largest value has a rank of n) and then look at the distribution of these ranks. Note that while you can make similar conclusions with non-parametric tests, the \(H_0\) and \(H_A\) are not specified in terms of the mean, but rather whether the distributions of the two groups differ in central tendency (i.e. the median).
The null hypothesis for a Mann-Whitney U-test (assuming the variances are equal) is that the two groups do not differ in central tendency, i.e. \(H_0\): the two group medians are equal.
Note that this is different than parametric tests, which are testing differences in means.
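A toy example (made-up numbers) of the ranking idea, and the corresponding test in R:

```r
# Rank-based tests replace each value by its rank in the pooled sample
a <- c(3, 1, 4, 8, 5)    # toy group 1
b <- c(9, 2, 6, 7, 10)   # toy group 2
rank(c(a, b))            # smallest value gets rank 1, largest gets rank n

wilcox.test(a, b)        # Mann-Whitney U / Wilcoxon rank sum test
```

The test statistic is built from the group rank sums, which is why extreme values (outliers) have much less influence than they do on a mean.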
For Exercise 1, you will use the data set parasitoids. Parasitoids are organisms, often arthropods, that develop inside their host and kill their host when they emerge (think the Alien movies…creepy). Let’s look at the distribution of parasitized planthoppers, Tumidagena minuta, across patches in a salt marsh.
First things first, explore the data set
# dim(parasitoids) # 84 rows, 3 columns
# summary(parasitoids)
str(parasitoids)
## 'data.frame': 84 obs. of 3 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ patch_number : int 0 1 2 3 4 5 6 7 8 9 ...
## $ parasite_burden: int 3 1 4 2 5 0 1 0 0 0 ...
Next, do some quick visualizations of parasite burdens. What is the shape of the distribution - symmetrical or skewed?
ggplot(parasitoids, aes(x = parasite_burden)) +
geom_histogram(fill = "#fa7f2e", color = "grey90")+
theme_bw() +
labs(x = "Parasite Burden (counts)", y = "Frequency") +
theme(axis.title = element_text(face = "bold", size = 12))# right skewed distribution
Let’s say we wanted to use a statistical test that assumed normality.
Make a (a) qqPlot and run the (b) Shapiro-Wilk
test to further assess the normality of the variable
parasite_burden. Do you think this data would be normal
enough to run a parametric test?
## (a) qqPlot
qqPlot(parasitoids$parasite_burden)# not-normal
## [1] 51 84
## (b) Shapiro-Wilk Test
shapiro.test(parasitoids$parasite_burden)# p << 0.05; reject H0; not normal
##
## Shapiro-Wilk normality test
##
## data: parasitoids$parasite_burden
## W = 0.76417, p-value = 2.729e-10
There is a pretty big departure from normality here, so I would not feel comfortable using these data as is. So what are our options? Well, the first one is to try transforming the data. The data are right skewed, so let’s try a log and then a square-root transformation. Remember, after transforming data you must test normality again with qqPlot and the Shapiro-Wilk test.
Exercise 1a. Log transformation
## LOG TRANSFORMATION (Knowledge check: why did we add 1?)
parasitoids$log_parasite_burden <- log(parasitoids$parasite_burden +1) #make a new variable
# check normality with histogram
ggplot(parasitoids, aes(x = log_parasite_burden)) +
geom_histogram(fill = "#fa7f2e", color = "grey90")+
theme_bw() +
labs(x = "Log of Parasite Burden (counts)", y = "Frequency") +
theme(axis.title = element_text(face = "bold", size = 12)) # not really normal
# check normality with qqPlot
qqPlot(parasitoids$log_parasite_burden) # sorta normal
## [1] 51 84
# check normality with Shapiro-Wilk test
shapiro.test(parasitoids$log_parasite_burden) # p < 0.05; not normal
##
## Shapiro-Wilk normality test
##
## data: parasitoids$log_parasite_burden
## W = 0.95158, p-value = 0.003195
Exercise 1b. Square root transformation
## SQRT TRANSFORMATION
parasitoids$sqrt_parasite_burden <- sqrt(parasitoids$parasite_burden) #make a new variable
# same steps for checking normality
# histogram
ggplot(parasitoids, aes(x = sqrt_parasite_burden)) +
geom_histogram(fill = "#fa7f2e", color = "grey90")+
theme_bw() +
labs(x = "Square Root of Parasite Burden (counts)", y = "Frequency") +
theme(axis.title = element_text(face = "bold", size = 12))# not really normal
# qqPlot
qqPlot(parasitoids$sqrt_parasite_burden)# kinda normal
## [1] 51 84
# Shapiro-Wilk Test
shapiro.test(parasitoids$sqrt_parasite_burden) # p <0.05; not normal
##
## Shapiro-Wilk normality test
##
## data: parasitoids$sqrt_parasite_burden
## W = 0.94814, p-value = 0.002004
We see that neither log-transforming nor square-root-transforming our data made them normal. Therefore, a non-parametric test might be useful.
For Exercise 2, you will be using the data set crickets. The sage cricket has an unusual form of mating. During mating, the male offers his fleshy hind wings to the female to eat. The wounds are not fatal, but a male with already-nibbled wings is less likely to be chosen by females he meets subsequently. Females get some nutrition from feeding on the wings, which raises the question, “Are females more likely to mate if they are hungry?” Johnson et al. (1999) answered this question by randomly dividing 24 females into two feeding groups: one group of 11 females was starved for at least two days and another group of 13 females was fed during the same period. Finally, each female was put separately into a cage with a single (new) male, and the waiting time_to_mating was recorded.
Step 1. Check assumptions for untransformed data
# check normality with histogram
ggplot(crickets, aes(x = time_to_mating)) +
geom_histogram(fill = "#2a9d8f", color = "grey90")+
theme_bw() +
labs(x = "Time to Mating (seconds)", y = "Frequency") +
theme(axis.title = element_text(face = "bold", size = 12)) #not normal
You should still check normality with qqPlots and the Shapiro-Wilk test, but we’re going to skip those parts in this example and assume the assumptions have been violated.
Step 2. Transform data & re-check assumptions
# try log transformation
crickets$log_time <- log(crickets$time_to_mating+1)
# format data to look at distribution of both groups
starved <- crickets %>%
filter(feeding == "starved")
fed <- crickets %>%
filter(feeding == "fed")
## check assumptions on transformed data
# Normality assumption
shapiro.test(starved$log_time) # p > 0.05 now normal
##
## Shapiro-Wilk normality test
##
## data: starved$log_time
## W = 0.96151, p-value = 0.7903
shapiro.test(fed$log_time) # p < 0.05, but close to 0.05, so we'll call it normal based on the other normality checks
##
## Shapiro-Wilk normality test
##
## data: fed$log_time
## W = 0.85511, p-value = 0.03323
qqPlot(fed$log_time) # looks normal
## [1] 1 2
hist(fed$log_time) # not perfect, but more normal than untransformed time
# Variance assumption
leveneTest(time_to_mating ~ feeding, data = crickets) # p < 0.05 unequal variances
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 4.5709 0.04388 *
## 22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
leveneTest(log_time ~ feeding, data = crickets) # p > 0.05 equal variances
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 2.6901 0.1152
## 22
In the above example, our normality tools disagreed with each other, but because the log-transformed data were more normal than the untransformed data and close to our 0.05 cut-offs, I will say the log transformation worked and we can proceed with the parametric two-sample test. For your final project you will probably want to be stricter about this, but for the sake of this exercise, the log transformation worked.
Step 3. Re-test the data and report on the LOG transformed time
t.test(starved$log_time, fed$log_time, var.equal = TRUE)
##
## Two Sample t-test
##
## data: starved$log_time and fed$log_time
## t = -0.83586, df = 22, p-value = 0.4122
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.4712919 0.6259975
## sample estimates:
## mean of x mean of y
## 2.507106 2.929754
In Lab section, go over the steps for reporting results from your t-test, including writing the null & alternative hypotheses, concluding what your p-value means in terms of the null hypothesis (i.e. reject or fail to reject), and reporting the 95% CI and the LOG means of both sample groups. After stating all of this information, you can then conclude that “there is not a significant difference in the LOG mean time to mating between starved and fed crickets (t = -0.84, df = 22, p-value = 0.41).”
For Exercise 3, you will use the data set heat_walk.
This is a subset of data from the paper titled “Evidence of alliesthesia during a neighborhood thermal walk in a hot and dry city (Phoenix, Arizona)” by Dzyuban et al. 2022, which is cited below.
“Designing cities for thermal comfort is an important priority in a warming and urbanized world. As temperatures in cities continue to break extreme heat records, it is necessary to develop and test new approaches capable of tracking human thermal sensations influenced by microclimate conditions, complex urban geometries, and individual characteristics in dynamic settings.” In this study, the authors measured alliesthesia (a medical term for a felt sense of (dis)pleasure) in pedestrians as they walked along urban streets on a hot day. The authors found that even small changes in microclimate and shade were felt by participants.
For Exercise 3, we will be looking at the differences in wind speed
(mph, ws) at two stops where alliesthesia
measurements occurred. The two stops were chosen based on locations that
varied in PET (C). On average, Stop 3 had higher PET (C) than Stop 7,
and was in a less developed part of Phoenix.
Step 1. Check normality and variance assumptions of data
# Format data to look at distribution of both groups
hot <- heat_walk %>%
filter(stop == 3)
cool <- heat_walk %>%
filter(stop == 7)
## Normality
ggplot(heat_walk, aes(x = ws)) +
geom_histogram(aes(fill = as.factor(stop)), color = "grey1", alpha =.5)+
theme_bw() +
labs(x = "Wind Speed (mph)", y = "Frequency", fill = "Stop Station") +
theme(axis.title = element_text(face = "bold", size = 12)) # not normal
shapiro.test(hot$ws) # p < 0.05
##
## Shapiro-Wilk normality test
##
## data: hot$ws
## W = 0.88838, p-value = 4.226e-10
shapiro.test(cool$ws) # p < 0.05
##
## Shapiro-Wilk normality test
##
## data: cool$ws
## W = 0.87902, p-value = 1.307e-10
qqPlot(hot$ws) # not normal
## [1] 24 25
qqPlot(cool$ws) # not normal
## [1] 120 59
## Variances
leveneTest(heat_walk$ws ~ as.factor(heat_walk$stop)) # leveneTest won't run if data types are not factors # unequal
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 30.377 6.988e-08 ***
## 344
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the example above, we did not try a transformation because this particular data set looked non-transformable. However, in your homework questions you will always have to go through the steps of transforming your data and re-checking the assumptions on the transformed data as we did in prior exercises.
Step 2. Run Wilcoxon rank sum test
wilcox.test(heat_walk$ws ~ heat_walk$stop) # p < 0.05
##
## Wilcoxon rank sum test with continuity correction
##
## data: heat_walk$ws by heat_walk$stop
## W = 5672.5, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
Based on the above results, we can conclude that stops 3 and 7 do not have equal wind speeds (W = 5672.5, p-value < 2.2e-16). Note that the W (or U) statistic is analogous to a t or Z statistic - it comes from a distribution from which you can determine its associated p-value.
Since the Wilcoxon rank sum test (& Mann-Whitney U-test) compare distributions or medians, not means, you will need to do an extra step to find the central tendency of each group.
heat_walk %>%
group_by(stop) %>% # group by stop 3 or stop 7
summarise(median = median(ws, na.rm = TRUE)) # make sure to get rid of NAs
## # A tibble: 2 × 2
## stop median
## <int> <dbl>
## 1 3 0.8
## 2 7 2.1
From this, we can see that stop 7 has a higher median wind speed of 2.1 mph and stop 3 has a lower median wind speed of 0.8 mph. Perhaps the increased wind speed at stop 7 is why pedestrians reported feeling slightly cooler at that stop compared to stop 3.
For your Final Project, you will be responsible for creating a professional looking report about a biological system you investigated using statistical tests and R skills you learned throughout EEMB 146. Some additional skills that will be helpful are:
You have learned so much in Weeks 4 & 5! The concepts of hypothesis testing, assumption checking, and (non-)parametric tests are a core part of the Biometry curriculum. If you are still unsure about these concepts, it is highly recommended that you attend Lab Discussions, especially before the midterm.
(read through the vocab section in Exercise 5)
Match the correct test (one sample, paired, two sample, Welch’s, Mann-Whitney U, Wilcoxon rank sum) with the situation from a-g
# untransformed
hist(parasitoids$parasite_burden)
min(parasitoids$parasite_burden) # 0
## [1] 0
# log transform, no add 1
parasitoids$log_no1_count <- log(parasitoids$parasite_burden)
hist(parasitoids$log_no1_count)
min(parasitoids$log_no1_count) # -Inf
## [1] -Inf
# log transform, add 1
parasitoids$log_count <- log(parasitoids$parasite_burden + 1)
hist(parasitoids$log_count)
min(parasitoids$log_count) # 0
## [1] 0